AITopics | decoder input

Collaborating Authors

decoder input

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

Neural Information Processing SystemsDec-24-2025, 02:08:32 GMT

In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models powerful decoder' by applying uniformly random dropout to the decoder input.We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space.

adversarial approach, name change, training sequence vae, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Add feedback

Multi-Attribute Linguistic Tuning for Controlled Paraphrase Generation

Elgaar, Mohamed, Amiri, Hadi

arXiv.org Artificial IntelligenceOct-31-2024

We present a novel approach to paraphrase generation that enables precise control and fine-tuning of 40 linguistic attributes for English. Our model is an encoder-decoder architecture that takes as input a source sentence and desired linguistic attributes, and produces paraphrases of the source that satisfy the desired attributes. To guarantee high-quality outputs at inference time, our method is equipped with a quality control mechanism that gradually adjusts the embedding of linguistic attributes to find the nearest and most attainable configuration of desired attributes for paraphrase generation. We evaluate the effectiveness of our method by comparing it to recent controllable generation models. Experimental results demonstrate that the proposed model outperforms baselines in generating paraphrases that satisfy desired linguistic attributes.

augmentation, computational linguistic, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2410.24199

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
(17 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs

Neural Information Processing SystemsOct-10-2024, 18:42:44 GMT

In principle, applying variational autoencoders (VAEs) to sequential data offers a method for controlled sequence generation, manipulation, and structured representation learning. However, training sequence VAEs is challenging: autoregressive decoders can often explain the data without utilizing the latent space, known as posterior collapse. To mitigate this, state-of-the-art models weaken' thepowerful decoder' by applying uniformly random dropout to the decoder input.We show theoretically that this removes pointwise mutual information provided by the decoder input, which is compensated for by utilizing the latent space. We then propose an adversarial training strategy to achieve information-based stochastic dropout. Compared to uniform dropout on standard text benchmark datasets, our targeted approach increases both sequence modeling performance and the information captured in the latent space.

adversarial approach, latent space, training sequence vae, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Add feedback

Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting

Sima, Qi, Zhang, Xinze, Bao, Yukun, Yang, Siyue, Shen, Liang

arXiv.org Artificial IntelligenceJun-13-2024

Abstract--Recurrent neural network-based sequence-tosequence models have been extensively applied for multi-stepahead time series forecasting. These models typically involve a decoder trained using either its previous forecasts or the actual observed values as the decoder inputs. However, relying on self-generated predictions can lead to the rapid accumulation of errors over multiple steps, while using the actual observations introduces exposure bias as these values are unavailable during the extrapolation stage. In this regard, this study proposes a novel training approach called reinforced decoder, which introduces auxiliary models to generate alternative decoder inputs that remain accessible when extrapolating. Additionally, a reinforcement learning algorithm is utilized to dynamically select the optimal inputs to improve accuracy. ULTI-STEP-AHEAD time series prediction, which involves extrapolating a sequence of future values based extrapolating process, i.e., feeding back the one-step-ahead on historical observations, plays a vital role in various realworld prediction to the decoder to predict the value at the next step. Accordingly, research efforts have been devoted to developing statistical some non-autoregressive architectures were proposed and machine learning techniques for multi-step-ahead time to obviate the error propagation issue [10], [16], [17].

agent, decoder, prediction, (17 more...)

arXiv.org Artificial Intelligence

2406.09643

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Renewable (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation

Zan, Changtong, Ding, Liang, Shen, Li, Lei, Yibin, Zhan, Yibing, Liu, Weifeng, Tao, Dacheng

arXiv.org Artificial IntelligenceSep-28-2023

Zero-shot translation (ZST), which is generally based on a multilingual neural machine translation model, aims to translate between unseen language pairs in training data. The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs, e.g., for English and for German. Recent studies have shown that language IDs sometimes fail to navigate the ZST task, making them suffer from the off-target problem (non-target language words exist in the generated translation) and, therefore, difficult to apply the current multilingual translation model to a broad range of zero-shot language scenarios. To understand when and why the navigation capabilities of language IDs are weakened, we compare two extreme decoder input cases in the ZST directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively visualizing the contextual word representations (CWRs) of these cases with teacher forcing, we show that 1) the CWRs of different languages are effectively distributed in separate regions when the sentence and ID are matched (ON setting), and 2) if the sentence and ID are unmatched (OFF setting), the CWRs of different languages are chaotically distributed. Our analyses suggest that although they work well in ideal ON settings, language IDs become fragile and lose their navigation ability when faced with off-target tokens, which commonly exist during inference but are rare in training scenarios. In response, we employ unlikelihood tuning on the negative (OFF) samples to minimize their probability such that the language IDs can discriminate between the on- and off-target tokens during training. Experiments spanning 40 ZST directions show that our method reduces the off-target ratio by -48.0% on average, leading to a +9.1 BLEU improvement with only an extra +0.3% tuning cost.

aclanthology, language id, translation, (16 more...)

arXiv.org Artificial Intelligence

2309.16599

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder

Zhan, Jiaao, Chen, Qian, Chen, Boxing, Wang, Wen, Bai, Yu, Gao, Yang

arXiv.org Artificial IntelligenceAug-2-2023

Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models because NAT decoders do not depend on previous target tokens in the decoder input. We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input. First, we propose an autoregressive forward-backward pre-training phase before NAT training, which enables the NAT decoder to gradually learn bidirectional target dependencies for the final NAT training. Second, we transform the decoder input from the source language representation space to the target language representation space through a novel attentive transformation process, which enables the decoder to better capture target dependencies. DePA can be applied to any fully NAT models. Extensive experiments show that DePA consistently improves highly competitive and state-of-the-art fully NAT models on widely used WMT and IWSLT benchmarks by up to 1.88 BLEU gain, while maintaining the inference latency comparable to other fully NAT models.

dependency, nat model, translation, (12 more...)

arXiv.org Artificial Intelligence

2203.16266

Country:

Asia > Middle East > Iran (0.05)
Asia > China > Beijing > Beijing (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Enriching Non-Autoregressive Transformer with Syntactic and SemanticStructures for Neural Machine Translation

Liu, Ye, Wan, Yao, Zhang, Jian-Guo, Zhao, Wenting, Yu, Philip S.

arXiv.org Artificial IntelligenceJan-21-2021

The non-autoregressive models have boosted the efficiency of neural machine translation through parallelized decoding at the cost of effectiveness when comparing with the autoregressive counterparts. In this paper, we claim that the syntactic and semantic structures among natural language are critical for non-autoregressive machine translation and can further improve the performance. However, these structures are rarely considered in the existing non-autoregressive models. Inspired by this intuition, we propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer, for the task of neural machine translation. Moreover, we also consider the intermediate latent alignment within target sentences to better learn the long-term token dependencies. Experimental results on two real-world datasets (i.e., WMT14 En-De and WMT16 En-Ro) show that our model achieves a significantly faster speed, as well as keeps the translation quality when compared with several state-of-the-art non-autoregressive models.

machine translation, transformer, translation, (15 more...)

arXiv.org Artificial Intelligence

2101.08942

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Non-autoregressive Transformer by Position Learning

Bao, Yu, Zhou, Hao, Feng, Jiangtao, Wang, Mingxuan, Huang, Shujian, Chen, Jiajun, LI, Lei

arXiv.org Artificial IntelligenceNov-24-2019

Non-autoregressive models are promising on various text generation tasks. Previous work hardly considers to explicitly model the positions of generated words. However, position modeling is an essential problem in non-autoregressive text generation. In this study, we propose PNAT, which incorporates positions as a latent variable into the text generative process. Experimental results show that PNAT achieves top results on machine translation and paraphrase generation tasks, outperforming several strong baselines.

machine translation, transformer, translation, (15 more...)

arXiv.org Artificial Intelligence

1911.10677

Country:

North America > United States (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Guo, Junliang, Tan, Xu, Xu, Linli, Qin, Tao, Chen, Enhong, Liu, Tie-Yan

arXiv.org Machine LearningNov-21-2019

Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and both of them share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than $1$ BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speed up (more than $10$ times) the inference process over AT baselines.

curriculum, decoder input, translation, (13 more...)

arXiv.org Machine Learning

1911.08717

Country: Asia > China > Anhui Province (0.04)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

How to code The Transformer in Pytorch – Towards Data Science

#artificialintelligenceOct-2-2018, 16:22:07 GMT

Could The Transformer be another nail in the coffin for RNNs? Doing away with the clunky for loops, it finds a way to allow whole sentences to simultaneously enter the network in batches. The miracle is that NLP can then reclaim the advantage of python's highly efficient linear algebra libraries. The time-saving is then spent deploying more layers into the model. So far it seems the result is faster convergence and better results.

artificial intelligence, decoder, machine learning, (19 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback